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ABSTRACT 

This  paper  studies  prediction  of  future  failure  (rates)  by  hierarchical  empirical  Bayes  (EB)  Poisson 
regression  methodologies.  Both  a  gamma  distributed  superpopulation  as  well  as  a  more  robust 
(long-tailed)  log  student-t  superpopulation  are  considered.  Simulation  results  are  reported 
concerning  predicted  Poisson  rates.  The  results  tentatively  suggest  that  a  hierarchical  model  with 
gamma  superpopulation  can  effectively  adapt  to  data  coming  from  a  log-Studcnt-t  superpopulation 
particularly  if  the  additional  computation  involved  with  estimation  for  the  log-Student-t 
hierarchical  model  is  burdensome. 


1.      INTRODUCTION 

The  following  model  often  provides  a  useful  place  from  which  to 
commence  the  analysis  of  point  event  process  data.  First,  suppose  there  is  a 
set  of  I  entities  or  units,  each  of  which  generates  an  observed  history  of  point 
events.  Take  each  describing  point  process  to  be  homogeneous  Poisson  (\\),  i 
=  1,  2,  3,  ...,  I.  The  observed  data  appears  as  (sj,tj),  Sj  being  the  number  of 
events  for  process  i  over  active  or  operating  time  tj.  Also  observed  are  certain 
fixed  explanatory  variable  values;  xjj,  j  =  1,  2,  ...,  p,  associated  with  Xj.  In  some 
literature,  e.g.,  Everitt  (1984),  such  variables  are  called  manifest.  Second, 
there  is  a  latent  quantity,  5j,  associated  with  X\,  that  is  unobservable  but 
influences  X\  behavior.    It  is  convenient  to  view  8j,  at  least  provisionally,  as 


being  drawn  randomly  from  some  superpopulation  of  values  and  held  fixed 
thereafter,  thus  endowing  X[  with  its  own  particular  individuality. 

We  call  such  a  setup  hierarchical,  and  ask  it  to  furnish  insights  and 
numbers  concerning  (a)  the  individual  rate  values,  X\,  (b)  the  influence  of  the 
explanatory  variables  upon  these  rates,  and  (c)  the  nature  of  the 
superpopulation  that  gives  rise  to  the  latent  variable  values;  future  values  of 
the  rates,  e.g.,  A-i+i,  etc.,  may  be  viewed  as  coming  from  such  a  population,  at 
least  to  a  first  approximation. 

The  above  model  suggests  itself  for  many  purposes,  one  in  particular 
being  in  risk  analysis,  e.g.,  of  nuclear  power  plant  safety  systems.  Such  setups 
are  also  natural  in  other  reliability-related  areas  as  well,  particularly  in  ones 
arising  in  the  military.  Application  may  perhaps  be  made  to  data  reflecting 
"human  unreliability,"  i.e.,  the  propensity  of  different  individuals  to  make 
errors,  or  experience  accidents. 

The  purpose  of  this  paper  is  to  describe  methods  for  fitting  various 
hierarchical  models  to  the  type  of  data  described.  Particular  attention  is 
devoted  to  the  prediction  problem:  given  the  past  record  of  an  individual 
item  (e.g.,  human  being),  how  well  can  one  predict  its  (her)  future 
performance,  even  if  some  basic  conditions  change? 

2.      THE  FORMAL  MODEL 

The  formulation  proposed  can  be  written  as  follows:    for  i  =  1,2,  ...,  I,  and 

B=(pi,p2,...,pV>T 

8i~IIDg(;fi) 

*i  =  f(xjfi,5i),  (2.1) 

Sj  I  Xi,tj~Ind.  Poiss(Xjtj); 

9_  =  (9i,  02/ ...,  0r)  being  a  parameter  identifying  g,  the  density  associated  with 
the  assumed  fixed  superpopulation.  In  what  follows  we  concentrate  on 
certain  parametric  forms  for  the  link  function  f  and  the  superpopulation  g, 
and  aim  at  estimating  the  H-value  best  representing  the  superpopulation 
giving  rise  to  the  apparent  X.-values.  For  various  reasons,  convenience  and 
tradition  being  influential,  we  restrict  attention  to  the  log-linear  model 

*i  =  f(xi&5j)  =  exp(xiG+6i).  (2.2) 

As  suggested  earlier,  the  objectives  of  the  analysis  will  be  several-fold,  but  an 
important  one  will  be  to  estimate  an  individual  Xj-value,  i.e.,  the  actual 
realization  of  X[  that  prevails.   An  even  more  important  objective  is  to  predict 


the  number  of  future  events  associated  with  i,  Si(t).    This  entails  finding  an 

A  A 

estimate  JL    and    one   for    the    individualizing    parameter   5j,   namely    6j. 
Estimation  will  be  carried  out  by  assuming  that  g(-,0J,  the  superpopulation 
density  giving  rise  to  5j,  is  one  of  a  specific  parametric  family,  and  first 
estimating    the    parameters   of   that   density   along   with    the   regression 

A  A 

parameters.   At  a  later  stage,  the  estimated  parameters  ft  and  9_  are  utilized  to 
create  estimates  of  5j,  and  finally  Xu  see  Cox  and  Hinkley  (1974),  p.  401,  Morris 
(1983),  Deely  and  Lindley  (1981),  etc.   The  several-stage  or  hierarchical  analysis 
is  referred  to  as  parametric  empirical  Bayes  (PEB). 

This  work  is  an  extension  of  Gaver  and  O'Muircheartaigh  (1987)  in  which 
discrepancy-tolerant  (robust)  estimates  of  5j  and  X\  were  produced  and 
evaluated  without  consideration  of  explanatory  variables.  The  major 
purpose  of  the  present  article  is  to  consider  the  effect  of  explanatory  variables 
in  the  context  of  hierarchical  models  using  quite  different  models  for 
superpopulations:  first,  the  simple  conjugate  Gamma,  and  next  the  log- 
Student  t  with  a  small  number  of  degrees  of  freedom  so  that  tails  are 
extended,  and  outliers  more  apt  to  be  generated. 

3.      AN  EMPIRICAL  BAYES  APPROACH 

The  approach  taken  to  providing  estimates  is  traditional;  see  Berger  (1985, 
Chap.  4).  We  first  remove  the  condition  on  8  for  each  item  to  obtain  the 
unconditional  likelihood 

x       '    ,     f(*-M<.(f(x,£,8)tX' 


i=l 


S.! 


The  latter  is  then  maximized  with  respect  to  ft  and  6_  to  produce  B_  and  9_. 
These  quantities  are  then  inserted  into  the  expression  for  the  posterior 
density  of  6, 

*,(*)■  «,(***  ,«.*«)= K,e-'M'S>''  (/(*,&  s)>, )"  *(*; «) 

where  the  constant  Kj  is  a  normalizing  factor.   A  point  estimator  of  X\  is  taken 
to  be  the  posterior  mean  (other  options  of  course  available), 

K  =  I  f(lis)gp(5)d8  (3.2) 


where  xt  is  the  value  of  the  explanatory  variable  for  conditions  anticipated 

when  the  estimate  is  to  be  applied;  if  x{  =  xx  then  we  have  Xu  an  empirical 
Bayes  estimate  of  X\,  for  conditions  under  which  the  data  were  taken;  this  will 
often  be  a  shrunken  estimator  that  has  a  smaller  mean-squared  error  than 

does  a  simple  individual  estimator.      If  xt   refers  to  other  (e.g.,  future) 

conditions  then  A,  calculated  by  (3.2)  may  be  called  the  mean  predictive  rate. 
If  more  information  is  desired  then  the  entire  predictive  distribution  is 
needed: 

m)  =  le->^(f(li8)tfj-tgp(S)dS.  (33) 

this  approximates  the  conditional  probability  of  s,  future  events  for  item  i, 

given  that  it  is  exposed  for  time  tt  and  under  conditions  xt. 

It  is  apparent  that  the  approximation  so  obtained  may  be  under-variable, 

in  that  it  treats  P  and   6  as  fixed  and  known  in  (3.2)  and  (3.3).     The 

hierarchical  Bayes  analysis  described  by  Berger  (1985,  Chap.  4)  is  a  substitute 
that  avoids  that  criticism.  This  defect  is  undeniable,  but  some  appreciation 
for  the  magnitude  of  the  effect  can  be  obtained  by  bootstrapping.  Of  more 
concern  to  us  has  been  investigation  of  the  effect  of  superpopulation  model 
choice:  how  different  can  actual  prediction  be  in  simple  situations  modeled 
quite  differently?  We  proceed  to  compare  and  contrast  two  models,  one 
conjugate  Gamma  and  the  other  longer-tailed  and  hence  outlier-prone. 

4.      GAMMA  LATENT  VARIABLE  POISSON  REGRESSION  (GALVPR). 

It  is  conventional  and  convenient  to  invoke  the  gamma  density  to 
represent  the  random  effect  in  (2.1);  see  Lawless  (1987  a,b)  and  Anscombe 
(1950)  for  examples.   Thus 

8       '  r(a^\     "  (41) 

is  the  superpopulation  model,  from  which 

E[8]  =  l  (4.2) 

Var[b]  =  a. 


Lawless  (1987b)  gives  expressions  for  the  ln-likelihood  and  its  derivatives  for 
this  hierarchical  model.  It  turns  out,  however,  that  a  more  satisfying 
parameterization  is  in  terms  of  9=ln  a  when  the  mle  stage  is  undertaken. 
Since  a  one-parameter  gamma  density  is  used,  the  regression  has  a  constant 
term;  that  is  xji  =1.  For  convenience  we  provide  expressions  for  the  ln- 
likelihood  and  its  derivatives  using  our  parameterization. 
In  the  present  parameterization,  then, 

X  =  Uexp[xifj]  (4.3) 

so  exp(8)  =  U  is  gamma.   In  order  to  form  the  likelihood  element  in  (3.1)  it  is 
only  necessary  to  integrate  to  obtain  the  explicit  form 


V'Y 


1 


V 


yjTch+lj 


(4.4) 


where  cx  =exp|x,-^|.   The  log-likelihood  is 


/(0,g;s,f)  =  2 


[1 7=0 


In  S{ !+  s,- 


lnqfj  -  ln^c.-f,-  +  l)l  -  e~6  ln(eflc,-ri  + 1) 

(4.5) 


«.— i 


where  £ln(1  + /*'**)  =  1  if  s,  =  0. 

;=0 

The  following  derivatives  can  be  obtained. 


de       & 


(*-* 
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H>1  +  Je0 


+  e-2aln 


e^qti  + 1 


qti 


eeciti  + 1 


Sj  +e 


(4.6) 


de2     de 


X  I 


(s,-\     (  ■  \ 


I    0 


\  +  jee  )) 


l\ 


-2e-38ln[e8a  +  ll  +  2-^i£— 


e"c.t.  +  1       e"ctt,  +  1 


ct 


\s.+e 


(4.7) 
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=  1 


i=i 


s,-<^, 
eec,t.+l 


(4.8) 


d2l  '[sje^itj+Citj]^ 


dfodPj 


i-1 


(e6Citi  + 1) 


ijxik 


(4.9) 


where  the  summations  involving  sj-1  are  set  equal  to  zero  when  Si=0. 
Further, 


-d2l 


=  1 


ct 

ii 


1=lVe  C&  +  1, 


xijx*.  ■ 


(4.10) 


A  Newton-like  iterative  procedure  is  used   to  solve   the  system  of 
equations 


dd 


(4.11) 


0  = 


dpk 


(4.12) 


If  Si  is  large  then  evaluating  the  sums  appearing  in  (4.5),  (4.6),  (4.7),  (4.13) 
and  elsewhere  tends  to  be  time-consuming.  However  all  such  sums  are  well- 
behaved  (of  monotonic  formations)  and  can  be  well-approximated  by 
integrals.  This  feature  is  not,  but  easily  can  be,  included  in  our  programs. 

If  (Pk)  were  known,  then  a  Newton  procedure  to  estimate  9  would  be  to 
recursively  solve  the  linear  equation 

d2l 


0  =  -^  =  ^ 

dd    dd 


+ 


0  =  0°    U? 


6  =  0°  (d-6°) 


(4.12) 


n  31 

where  9U  is  a  current  estimate  of  9.  Note  that  if  ^  =  0,  then 
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(4.13) 


Hence,  (4.12)  can  be  rewritten  as 


e-e°  = 


i 

1=1 


Si-1 


/=o1  +  ;e 
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e-26\n\eec4i+ti--g&—\si  +  e0 
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s,-  +  e~ 


(4.14) 

To  obtain  an  initial  estimate  of  9,  note  that,  letting  Nj(ti)  denote  the  ith 
random  variable  of  the  number  of  observed  events, 

E["N<M]  =  cA;  (415) 

Var[Nl{tt)]  =  clt,[l  +  cltlee}.  (4.16) 

Thus,  L(Ni(ti)-Cit,)/VcitiJ  has  mean  0  and  variance  [1  +  Cjtje0].    We  propose 
starting  the  iterative  procedure  to  find  9  by  computing 


m 


-it 

1  1=1 


Vc,0 


(4.17) 


If  m  <  lt  then  a  log-linear  model  is  used  to  describe  the  data.    If  m  >  1,  then 
the  initial  estimate  of  9  is 


00  =  In 


(rh-l)Hc,t, 


-i-i 


(4.18) 


If  9  were  known,  then  (p\)  could  be  estimated  with  generalized  linear 
model  software  in  the  following  manner,   (cf.  McCullagh  and  Nelder   [1983]). 

r)l 
A  Newton  iteration  to  solve  the  equations  0  = is  to  solve  the  system  of 

equations 


( 


where  {i°  is  the  current  estimate  of  (i. 
Put 


d2l 


_*/W>J  (£  =  g°) 


fa-fi) 


(4.19) 


w. 


c,t, 


eeci,+l 


where  q  =  exp{xjji0}. 

Equation  (4.19)  can  be  rewritten  as 
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k=l,...,p 
where 


/-i 


and 


"*  =  »,■**•■ 


(4.20) 


(4.21) 


(4.22) 


(4.23) 


The  equations  of  (4.21)  are  the  normal  equations  for  a  least  squares  regression. 
The  following  is  an  iterative  procedure  to  obtain  estimates  of  0  and  (PiJ 
a)      Fit  a  log-linear  model  stopping  after  one  iteration 
1.      Start  with 


,o 


*,-£    =  In 
2.      Solve  the  equation  (4.21) 
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'     2 
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with 
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S.+- 

'     2 


",;  =  WiXfji 


y>  =  -™, 


r*^ 


\wiJ 


+  wixi£°. 


b)      Find  the  initial  estimate  of  9  by  evaluating  (4.18).   If  m<l,  use  the  log- 
linear  model  of  a)  to  describe  the  data. 

I.       Next  estimate  {f5k):   Evaluate  and  solve  equations  (4.20)  -  (4.23). 


II.     Next  estimate  9:   Evaluate  and  solve  equation  (4.14). 

HI.    Continue  alternating  between  I  and  II  until  convergence. 
In  the  simulation  experiments  described  in  Section  7  the  above-obtained 
estimate  of  0  occasionally  either  cycled  among  negative  values  or  became 
very  large  and  positive.    In  these  cases  II  was  replaced  by  a  search  of  the 
marginal  likelihood  for  0  with  fixed  {fa). 

5.      ROBUST  HIERARCHICAL  POISSON  REGRESSION  (ROLVPR):    THE 
LOG-STUDENT  f  SUPERPOPULATION 

As  an  alternative  to  the  GALVPR  model,  allow  5  to  have  the  Student  t 
density 

8(8;r\d)=        CW  (5,} 

(1  +  5/ t2)  2 

this  distribution  is  adjustably  longer-tailed  than  is  the  log-Gamma 
distribution  (for  5)  of  the  previous  model,  and  hence  better  represents 
outliers  and  extreme  extra-Poisson  variability.  The  parameter  d  is  the 
"degrees  of  freedom"  for  the  Student  t;  for  the  present  purpose  a  low  value  of 
d  (e.g.,  d  =  3-5)  is  useful.  The  Student-t  model  for  log  failure  rate  was 
introduced  in  Gaver  and  O'Muircheartaigh  (1987).  There  it  was  pointed  out 
that  the  marginalization  step  of  (3.1)  could  be  performed  using  Gauss- 
Hermite  numerical  integration;  see  Naylor  and  Smith  (1982).  In  this  paper 
we  employ  a  variant  of  the  Gauss-Hermite  technique  that  involves  an  initial 
correction  by  Laplace's  method. 

The  procedure  currently  adopted  for  fitting  the  regression  parameters  (1 
in  addition  to  the  Student  t  parameter  i  proceeds  iteratively:  first  explain  as 
much  item-to-item  variability  as  possible  by  suitably  weighted  regression, 
then  alter  the  model  to  approximately  adjust  for  regression  effects  and  apply 
the  methodology  of  Gaver  and  O'Muircheartaigh  (1987)  to  estimate  t2.  This 
value  then  provides  refined  weights  for  a  new  regression.  We  speak  of 
rocking  back  and  forth  between  the  regression  and  latent   variable  stages. 

5.1.   Rocking  Algorithm  when  8j  ~  Student  (|i,  T,  d) 

Here  is  how  the  above  procedure  operates  when  latent  variables  are 
Student  t  so  as  to  represent  adjustably  long-tailed  outlier-prone  regressions;  d 
>  1  is  a  tuning  parameter  with  Var[8]  =  i2d/(d-2)  if  d  >  2. 

a)      Regress  yi(l )=  V^i  ln(sj/tj)  on  V^i  x;;  (5.1) 


Replace  Sj /ti  =  0/tj  by  l/3tj.  Obtain  fl(l). 

b)      In  the  ith  likelihood  component  obtained  by  integrating  out  with 
respect  to  the  5j-distribution, 

L{x\H,s„t,)  =  \-  e-^(X,(z))" CW      M\iz  (5.2) 

[l  +  (z2/r>d)} 


2      * 


where 

lra,(z)  =  *,£  +  Z/  (5.3) 

replace  tj  by  f,e-'       =  f,(l)-     Now  numerically  optimize  (5.2)  by  choice  of 

t=t(2);t(1)  is  a  moment  estimator.     Details  of  the  likelihood  integral 
approximation  and  optimizations  are  furnished  in  Section  6. 


c)      Regress  y  (2)  = 


<\      «„-,     d    ^ 


-l  -l 


+  r2(2) 


Vs.         '     d"2y 


? 


In  (s./f,)  on 


Vs,  rf-2y 


2 


£ 


where  ti  and  Sj  are  the  original  data  values.  Obtain  (3(2). 

d)  In  step  b  above  replace  tj  by  t{e         and  x^  by  x,/K2)  and  again 
numerically  optimize   to  find  r(3). 

e)  Return  to  step  c)  with  ^(3). 

0      Continue  to  convergence  of  |i?(/c)>,{r  (£)J. 

The  above  procedure  converges  rapidly  in  our  experience,  giving  results 
in  close  agreement  with  the  simultaneous  optimization  of  the  likelihood 
with  respect  to  x  and  p.  The  latter  is  a  much  more  computationally 
demanding  procedure  than  is  rocking. 

6.      LIKELIHOOD  COMPUTATION 

An  essential  part  of  the  preceding  algorithm  is  the  numerical  evaluation 
of  likelihoods  of  this  form: 

i=l 

where 
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=  \~e-QAz)dz.  (6.2) 

Under  the  log-linear  model 

lnA,(2)  =  xJ  +  2  (6.3) 

so 

QI(2)  =  A1(2)fI-s1ln^(2)  +  ^-ln[l  +  z2/T2d]  +  lnT/  (6.4) 

omitting  irrelevant  constants.  In  order  to  evaluate  the  integral  in  (6.2) 
approximately  but  reasonably  accurately  we  apply  either  (a)  a  version  of 
Laplace's  method,  in  which  Qj(z)  is  approximated  by  a  quadratic  and 
integrated  explicitly;  alternatively  (b)  apply  a  refined  version  of  (a)  involving 
Gauss-Hermite  integration  of  the  error  resulting  from  the  quadratic 
approximation  to  (6.3).  Here  is  a  sketch  of  the  process.  In  what  follows  we 
will  modify  the  time  to  be  tje*^. 

6.1.   Laplace  Method,  and  a  Refinement. 

To  compute  L,  =  J   e~Q'^z'dz,  the  ith  likelihood  component,  we  begin  by 

approximating  Qj  by  a  quadratic  as  follows.  Since  X\  =  e2,  tj  is  modified  as 
indicated  above,  and 

_Q.(z)  =  _AirI.+silnA1.-—  ln(l  +  22/T2d)-lnr, 


dz 


(d  +  \\  1 


d    J[l  +  z2 /r2d]r2 


(6.5) 


where 


=  -ezt,+s,  -w(z)z/  x 


^(2)  =  ^[i  +  22/T2d] 


d 

is  considered  to  be  a  weight.  Now  -dQj/dz=0  entails  the  equation 
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t. 


S;  + 


d  +  1        \ 


[l  +  22  I  Xld\ 


)\ 


(6.6) 


Equation  (6.6)  may  have  two  solutions.     We  obtain  a  single  reasonable 
approximate  solution  to  (6.6)  as  follows:   An  initial  solution  to  (6.6)  is 

z,{0)  =  \n{s,/t,)  if  s,  >  0, 
zi(0)  =  ln(l/3rl)  if  s,=0; 

other  replacements  for  the  zero  count  situation  are  possible. 
Let  z,  be  the  solution  to  the  equation 


eZ=I[s, -2^(2,(0))/ T2] 


after  one  Newton-Raphson  iteration  starting  at  Zi(0). 

Next  evaluate  an  approximation  to  Q,-(z,-).  First  approximate 


Hence, 


T 


cxw-^i+bk^. 


(6.7) 


(6.8) 


(6.9) 


Finally  approximate  e  '  by  sj/ti  resulting  in  the  approximation 


Write 


QI(2)  =  Q,.(2|.)  +  (z-2|.)2|QI"(zi)  +  i?,(2); 

Laplace's  method  assumes  that  R,(z)  is  negligible  and  hence 


(6.10) 


(6.11) 
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=  e V2«7Vft"fi) 

so  the  log-likelihood 

(6.12) 


(6.13) 

which  can  be  numerically  optimized  by  choice  of  x2  for  fixed  tuning  constant 
value  d  (in  principle  optimization  on  d  can  also  be  included). 
Improved  numerical  results  have  been  achieved  by  writing 


=  e-Q™p/Qr(zi)jy«\-R'Wdzv 


(6.14) 


with 


R*(w)  =  R^2/Qr{z,)w  +  z,),  (6.15) 

Ri(w)  being  defined  by  (6.11).  The  integration  is  then  performed  by  Gauss- 
Hermite  technique,  i.e.,  by  replacing  the  integral  by  a  finite  sum  at  points  wj 
determined  by  zeros  of  the  Hermite  polynomials;  see  Abramowitz  and  Stegun 
(1964).  Experience  has  shown  that  the  above  produce  numerical  results  that 
agree  well  with  other  numerical  methods  such  as  that  of  Naylor  and  Smith 
(1982);  however,  the  unadorned  Laplace,  (6.12),  may  sometimes  be  satisfactory, 
and  is  certainly  more  quickly  computed,  which  is  a  virtue  if  bootstrapping  is 
undertaken. 

Alternative  computational  procedures  exist  and  have  virtues.  The 
Newton-like  iteration  applied  to  the  Gamma  model  of  Section  4  can  be 
adapted  to  the  log-Student  model,  but  we  have  not  undertaken  this  as  yet.  A 
sampling-based  approach  of  Gelfand  and  Smith  (1988)  is  a  natural  option,  but 
at  present  appears  unnecessarily  computer-intensive.  As  will  appear,  even 
the  apparently  crude  rocking  approximation  proposed  leads  to  interesting 
contrasts  between  predictions  made  by  the  conventional  conjugate  Gamma 
and  the  robustifying  Student. 
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7.      NUMERICAL  ILLUSTRATIONS 

In  order  to  illustrate  the  performance  of  the  two  proposed  prediction 
schemes  we  have  performed  extensive  simulations.  These  illustrate  the 
anticipated  comparative  performance  of  GALVPR  and  ROLVPR:  the  latter  is 
often  better  able  to  adapt  to  the  appearance  of  large  outlier  rates  by  refusing  to 
shrink  them  down  as  extensively  as  do  the  former.  The  difference  between 
the  predictions  made  by  the  two  schemes  is  less  noticeable  for  small  rates; 
here  the  behavior  of  the  gamma-based  approach,  GALVPR,  may  actually  be 
superior,  probably  because  of  the  approximations  made  when  implementing 
the  Student  ROLVPR  model.  Improvements  in  the  current  procedure  for 
fitting  the  latter,  e.g.,  when  counts  are  zero,  are  likely  to  show  up  as  reduced 
upward  shrinkages. 

Simulation  Experiment 

The  present  simulations  are  all  based  on  a  group  of  1=20  items.    For  the 
log-linear  rate  of  (2.2)  x\Q,  =  3i  +  ^xj,  and  x,  =  +1  for  i  =  1,  2, ...,  10,  Xj  =  -1  for  i  = 
11,  12,  ...,  20.   In  addition  fii=0.5  and  fc  =  0.1  and  0.3,  while  tj  =  2  throughout. 
For  each  experiment  20  Poisson  rates  were  then  generated  from  the  Student 
model  with  %  =  1.0  and  d  =  5,  and  for  each  rate  a  single  Poisson  data  point  was 
generated  with  mean  X-itj.    These  then  constitute  the  observed  counts  from 
which  predictions  are  made.    Each  prediction  is  viewed  as  a  point  estimate  of 
the  underlying  Poisson  mean  giving  rise  to  the  corresponding  observed 
count;  it  is  a  natural  point  estimate  for  a  future  count.    The  predictions  are 
chosen  to  be  the  means  of  the  posterior  distributions  from  the  GALVPR  and 
ROLVPR  model  specifications,  where  each  model  is  fitted  to  the  data  (20 
counts,  plus  values  of  Xj)  for  the  particular  experiment,  meaning  that  Pi,  i  =  1, 
2,  and  a,  for  GALVPR,  and  x2,  ROLVPR  were  estimated  as  described  earlier. 
These  models  were  actually  fitted  by  two  methods:   (a)  to  all  count  data  in  the 
experiment,  including  that  for  the  item  whose  rate  is  predicted,  and  (b)  to  all 
data,  but  omitting  the  observation  for  the  item  to  be  predicted,  i.e.,  in  cross- 
validation    mode. 

An  illustration  of  a  particular  experimental  outcome,  and  the 
corresponding  predictions  appears  in  Table  I.  Note  that  for  this  particular 
data  set  the  average  mean  square  error  of  ROLVPR  no-cross-validation 
predictions  is  the  smallest.  This  is  not  always  so;  see  the  figures  for 
comparisons  of  mean-squared-errors  for  the  two  shrunken  predictions,  and 
raw  predictions. 
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TABLE  I 

SAMPLE  COMPARISON  OF  RATES  AND  ESTIMATES 

Pl  =  0.5,p2  =  0.1,T  =  l,d  =  5 


Number 

Co- 

Ob- 

True 

Raw 

No  Cross- 

Cross-validation 

variate 

served 

validation 

i 

2fi 

Si 

Ai 

X-i(raw) 

Xi(GA) 

Xi(Stu.t) 

Xj(GA) 

Xi(Stu.t) 

l 

+1 

152 

87.57 

76.00 

74.56 

76.28 

53.56 

73.68 

2 

+  1 

3 

1.44 

1.50 

1.66 

1.62 

1.66 

1.66 

3 

+  i 

2 

0.47 

1.00 

1.17 

1.23 

1.17 

1.26 

4 

+  i 

2 

3.51 

1.00 

1.17 

1.23 

1.17 

126 

5 

+  i 

0 

0.65 

0.00 

0.19 

0.47 

0.21 

0.52 

6 

+  i 

0 

0.16 

0.00 

0.19 

0.47 

0.21 

0.52 

7 

+  i 

5 

2.07 

2.50 

2.64 

2.45 

2.63 

248 

8 

+  i 

3 

2.07 

1.50 

1.66 

1.62 

1.66 

1.66 

9 

+  i 

8 

1.20 

4.00 

4.10 

3.76 

4.10 

3.77 

10 

+  i 

1 

1.11 

0.50 

0.68 

0.85 

0.68 

0.86 

11 

-l 

9 

2.87 

4.50 

4.31 

4.15 

4.28 

4.16 

12 

-1 

2 

0.80 

1.00 

1.10 

1.19 

1.10 

1.19 

13 

-i 

3 

0.43 

1.50 

1.56 

1.59 

1.56 

1.57 

14 

-i 

(1 

1.22 

0.00 

0.18 

0.46 

0.19 

0.52 

15 

-1 

^ 

5.44 

4.50 

4.31 

4.15 

4.28 

4.16 

16 

-l 

7 

3.18 

3.50 

3.40 

3.26 

3.38 

3.27 

17 

-l 

11 

5.40 

5.50 

5.23 

5.10 

5.16 

5.08 

18 

-l 

0 

0.40 

0.00 

0.18 

0.46 

0.19 

0.52 

14 

■i 

3 

0.66 

1.50 

1.56 

1.59 

1.56 

1.57 

20 

l 

0 

0.03 

0.00 

0.18 

0.46 

0.19 

0.52 

MSE's 

7.83 

9.16 

7.34 

58.94 

10.61 

Observed 

True 

Si 

h 

267 

138.43 

0 

0.01 

3 

1.35 

12 

6.55 

0 

0.29 

2 

2.21 

2 

1.32 

5 

3.47 

0 

0.57 

2 

0.98 

7 

2.18 

3 

0.47 

36 

20.25 

4 

1.68 

2 

3.72 

0 

0.39 

6 

2.28 

2 

1.48 

9 

2.40 

10 

4.46 

MSE's 

Raw 


Cross-validation 


TABLE  D 

SAMPLE  COMPARISON  OF  RATES  AND  ESTIMATES 

Pl  =  0.5,p2  =  0.1,X  =  l,d  =  5 

No  Cross- 
validation 

Xj(raw) Xj(GA)     Xj(Stu.t)       Xj(GA)     Xj(Stu.t) 

133.50 
0.00 
1.50 
6.00 
0.00 
1.00 
1.00 
2.50 
0.00 
1.00 
3.50 
1.50 
18.00 
1.00 
2.00 
0.00 
3.00 
1.00 
4.50 
5.00 
2.22 
NOTE:    this  is  an  independent  experiment  from  the  same  setup  as  that  of  Table  1. 


132.11 

137.56 

101.48 

130.66 

0.17 

0.48 

019 

0.51 

1.65 

1.64 

1.65 

1.65 

6.10 

5.62 

6.10 

5.59 

0.17 

0.48 

0.19 

0.51 

1.16 

1.25 

1.16 

1.25 

1.16 

1.25 

1.16 

1.25 

2.64 

2.46 

2.64 

2.48 

0.17 

0.48 

0.19 

0.51 

1.16 

1.25 

1.16 

1.23 

3.52 

342 

3.52 

3.39 

1.60 

1.71 

1.60 

1.71 

1741 

17.48 

16.91 

17.40 

1.12 

1.31 

1.12 

1.32 

2.08 

2.12 

2.08 

2.12 

0.17 

0.51 

0.18 

0.58 

3.04 

3.97 

3.04 

2.96 

1.12 

1.31 

1.12 

1.32 

4.58 

4.30 

4.48 

4.29 

4.96 

4.76 

4.95 

4.77 

2.90 

1.08 

69.51 

4.09 
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Table  II  provides  another  illustration  of  the  estimates'  performance,  this  time 
with  fewer  extreme  outliers.  Figure  1  exhibits  histographs  of  the  mean- 
squared  error  difference,  MSE(ROLVPR)-MSE(GALVPR),  for  100  replications 
of  the  above  specific  simulation.  Note  that  the  advantage  is  in  favor  of 
ROLVPR  in  the  majority  of  the  experiments,  with  an  exceptional  advantage 
displayed  in  some  cases.  Often  it  is  in  a  few  cases  of  exceptionally  large  rates, 
and  counts,  that  ROLVPR  excels.  Figure  2  compares  the  mean-squared  errors 
of  each  shrunken  no-cross  validated  estimator  with  the  corresponding  mean 
square  error  of  the  raw-rate  estimators;  the  raw  rate  estimator  is  simply  the 
count  divided  by  tj  =  2.  For  these  data  sets  the  indications  are  that  ROLVPR 
improves  upon  RAW  most  of  the  time  when  (32=0.1  and  when  p2  =  0.3 
(although  less  decisively),  while  RAW  improves  upon  GALVPR  most  of  the 
time;  neither  victory  is  decisive.  These  results  are  perhaps  not  surprising 
when  one  refers  to  Morris  (1983),  Theorem  1  and  subsequent  discussion.  It 
appears  that  the  convenient  conjugate  can  adapt  to  non-gamma  data  quite 
well  in  many  of  the  present  cases  at  least. 

An  undoubted  disadvantage  of  the  ROLVPR  procedure  is  its  computer 
intensivity:  computation  of  its  estimators  requires  far  more  time  than  does 
GALVPR  because  a  root  must  be  found,  (6.6),  and  a  numerical  integration 
performed.  Search  is  on  for  a  more  tractable  representation  of  a  "robust  g" 
that  permits  analytical  rather  than  computational  evaluation.  The  inverse 
Gaussian  is  a  candidate;  see  Dean  et  al.,  (1989).  Conceivably  such  an  adoption 
will  result  in  better  results  for  small-rate  situations.  Needless  to  say  RAW, 
which  quotes  A.j(RAW)  =  Sj/tj  is  by  far  the  most  economical.  Of  course  it  may 
not  be  used  if  the  covariate  value,  xj,  changes. 
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